Labelled network subgraphs reveal stylistic subtleties in written texts

Authors

Vanessa Q. Marinho

Graeme Hirst

Diego R. Amancio

Abstract

The vast amount of available data and the increase in computational capacity have allowed texts to be analyzed from several perspectives, including their representation as complex networks. Nodes of the network represent words, and edges represent some relationship between them, usually word co-occurrence. Even though networked representations have been applied to some tasks, such approaches are not usually combined with traditional models that rely on statistical paradigms. Because networked models are able to grasp textual patterns, we devised a hybrid classifier, called labelled motifs, that combines the frequency of common words with small structures found in the topology of the network, known as motifs. Our approach is illustrated in two contexts: authorship attribution and translationese identification. In the former, a set of novels written by different authors is analyzed. To identify translationese, texts from the Canadian Hansard and the European Parliament were classified as original or translated instances. Our results suggest that labelled motifs are able to represent texts and should be further explored in other tasks, such as the analysis of text complexity, language proficiency, and machine translation.
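The idea of combining common words with network motifs can be illustrated with a minimal sketch. It is not the authors' implementation: the adjacency-based co-occurrence rule, the restriction to a single three-node chain motif, the requirement that all three nodes be common words, and the names `cooccurrence_edges`, `induced_edges`, and `labelled_chain_counts` are all assumptions made for illustration.

```python
from collections import Counter


def cooccurrence_edges(tokens):
    """Directed edges between adjacent words (assumed co-occurrence rule)."""
    return {(a, b) for a, b in zip(tokens, tokens[1:]) if a != b}


def induced_edges(nodes, edges):
    """All edges of the network that connect two of the given nodes."""
    return {(u, v) for u in nodes for v in nodes if (u, v) in edges}


def labelled_chain_counts(tokens, common_words):
    """Count induced chains a -> b -> c whose three nodes are common words.

    Each count is keyed by the words occupying the chain, so the same
    topological pattern is distinguished by the labels of its nodes.
    """
    edges = cooccurrence_edges(tokens)
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    counts = Counter()
    for a, b in edges:
        for c in successors.get(b, ()):
            if c in (a, b):
                continue
            nodes = (a, b, c)
            if not set(nodes) <= common_words:
                continue
            # Keep only true chains: no extra edge among the three nodes.
            if induced_edges(nodes, edges) == {(a, b), (b, c)}:
                counts[nodes] += 1
    return counts


if __name__ == "__main__":
    tokens = "a b c a b d".split()
    print(labelled_chain_counts(tokens, {"a", "b", "c", "d"}))
```

In a classifier, counts like these would typically be normalised and used as a feature vector for each text. Which motifs to count and how many common words to keep are design choices this sketch leaves open.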

Similar resources

Authorship recognition via fluctuation analysis of network topology and word intermittency

Statistical methods have been widely employed in many practical natural language processing applications. More specifically, complex networks concepts and methods from dynamical systems theory have been successfully applied to recognize stylistic patterns in written texts. Despite the large number of studies devoted to representing texts with physical models, only a few studies have assessed the r...

Stylistic Changes for Temporal Text Classification

This paper investigates stylistic changes in a set of Portuguese historical texts ranging from the 17th to the early 20th century and presents a supervised method to classify them per century. Four stylistic features – average sentence length (ASL), average word length (AWL), lexical density (LD), and lexical richness (LR) – were automatically extracted for each sub-corpus. The initial analysis of ...

Elimination of the Elements of the Sentense in Sahife-ye-Shahi Book

Language always goes forward the brevity way, which means trying to convey its intentions by using the least number of words. The consequence of this process is contingencies such as deletion of sentence components. Poets and writers sometimes omitted some of the components of the word in order to summarize the word and, of course, to observe the principles of rhetoric, punctilios and syntactic ...

Ordinal measures in authorship identification

The goal of this paper is to compare a set of distance/similarity measures, regarding their ability to reflect stylistic similarity between authors and texts. To assess the ability of these distance/similarity functions to capture stylistic similarity between texts, we tested them in one of the most frequently employed multivariate statistical analysis settings: cluster analysis. The experimen...

Ensuring Stylistic Congruity in Collaboratively Written Text: Requirements Analysis and Design Issues

Often, texts that have been written collaboratively do not "speak with a single voice." Eliminating stylistic incongruity, a difficult undertaking for both collaborative and singular writers, is the desired function of a software tool. This thesis describes the first cycle of an iterative software development process towards meeting this goal. The user requirements are analyzed with respect to a mo...

Publication date: 2017
